/*******************************************************************************
Replication Materials for Blau, Kahn, Brummund, Cook, and Larson-Koester "Is 
There Still Son Preference in the United States?" Journal of Population
Economics, forthcoming.

CPS Sample Construction and Variable Definitions

Date Modified: 11/06/2019

*******************************************************************************/


*** Regression File Construction

The CPS regression file is constructed by merging spouse, child, parent, and 
unmarried partner characteristics onto each CPS individual, creating one "wide" 
format dataset. Country characteristic variables from auxiliary datasets are
then merged in for individuals, spouses, and parents, matched on each person's
country of birth. Separate files are made for the March and June CPS analyses,
with March using years 1995 to 2014 and June using 2008, 2010, and 2012. We
include the year, serial, and pernum variables so that users can merge in
additional CPS variables.
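
As a rough illustration of how the year, serial, and pernum keys could be used
to attach an additional CPS variable, the sketch below joins two small in-memory
tables in Python. All records are made up, and the name extra_var is a
hypothetical placeholder for whatever variable a user wants to add; this is not
the code used to build the replication files.

```python
# Hypothetical illustration: attach one extra variable to the regression file
# using the (year, serial, pernum) key. All records below are made up.

regression_rows = [
    {"year": 2010, "serial": 101, "pernum": 1, "age": 34},
    {"year": 2010, "serial": 101, "pernum": 2, "age": 36},
    {"year": 2012, "serial": 101, "pernum": 1, "age": 29},
]

# Extract holding the additional variable (extra_var is a placeholder name).
extra_rows = [
    {"year": 2010, "serial": 101, "pernum": 1, "extra_var": 40},
    {"year": 2012, "serial": 101, "pernum": 1, "extra_var": 25},
]


def person_key(row):
    """Key identifying one person record in one survey year."""
    return (row["year"], row["serial"], row["pernum"])


def merge_extra(base, extra, varname):
    """Left-join `varname` from `extra` onto `base`; unmatched rows get None."""
    lookup = {person_key(r): r[varname] for r in extra}
    return [dict(r, **{varname: lookup.get(person_key(r))}) for r in base]


merged = merge_extra(regression_rows, extra_rows, "extra_var")
```

The made-up data reuse serial 101 in two survey years to show why year is part
of the key in this sketch. Records with no match keep a missing value rather
than being dropped.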

Only children under age 18 are included as children. Child characteristic 
variables are numbered (e.g. c_age1, c_age2, etc.), with each number corresponding 
to the characteristics of one child. Children are numbered from oldest to 
youngest.
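
The numbering rule above can be sketched as follows. This is a minimal Python
illustration on made-up data, not the construction code itself; c_age follows
the naming shown above, while c_female and the input field names are
hypothetical.

```python
# Hypothetical sketch: build numbered child variables for one parent.
children = [
    {"age": 4, "female": 1},
    {"age": 19, "female": 0},  # not under 18, so not counted as a child
    {"age": 11, "female": 0},
]


def widen_children(kids, max_age=17):
    """Return wide-format child variables, numbered from oldest to youngest."""
    eligible = [k for k in kids if k["age"] <= max_age]
    eligible.sort(key=lambda k: k["age"], reverse=True)  # oldest first
    wide = {}
    for i, kid in enumerate(eligible, start=1):
        wide[f"c_age{i}"] = kid["age"]
        wide[f"c_female{i}"] = kid["female"]  # c_female is an assumed name
    return wide


wide_row = widen_children(children)
```

Here the 11-year-old becomes child 1 and the 4-year-old becomes child 2, while
the 19-year-old is left out.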

The main regression files have a "_core" tag appended to the file name. The files
ending with "_ext" are used for the extended sample robustness checks (this sample
is detailed in the data appendix). The only difference between these files is that
the "_ext" files include households with foster children. Other "extended sample" 
restrictions are made in the individual table code.

*** Regression File Sample Restrictions

The CPS regression files drop observations with a same-sex spouse or unmarried 
partner, drop individuals younger than 18 or older than 64, drop individuals 
living in group quarters, and drop individuals with two or more children born in 
the same year. Other sample restrictions are made in the table code.
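
The restrictions listed above could be expressed as a single keep/drop rule, as
in the Python sketch below. The field names (same_sex_partner, group_quarters,
child_birth_years) are hypothetical stand-ins for the underlying CPS variables,
and the records are made up.

```python
# Hypothetical sketch of the regression file sample restrictions.
def keep_in_regression_file(person):
    """Return True if the record survives all four restrictions."""
    if person["same_sex_partner"]:
        return False  # same-sex spouse or unmarried partner
    if person["age"] < 18 or person["age"] > 64:
        return False  # younger than 18 or older than 64
    if person["group_quarters"]:
        return False  # living in group quarters
    birth_years = person["child_birth_years"]
    if len(birth_years) != len(set(birth_years)):
        return False  # two or more children born in the same year
    return True


people = [
    {"age": 30, "same_sex_partner": False, "group_quarters": False,
     "child_birth_years": [2005, 2008]},
    {"age": 30, "same_sex_partner": False, "group_quarters": False,
     "child_birth_years": [2005, 2005]},
    {"age": 65, "same_sex_partner": False, "group_quarters": False,
     "child_birth_years": []},
]

kept = [p for p in people if keep_in_regression_file(p)]
```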


*** Sample Defining Variables

head_samp : Indicator for an observation being the household head or the spouse
	of the household head.
core_samp : Indicator for women ages 18 to 40 whose oldest child is 12 or younger,
	who have no foster children, whose children were all born in the United States,
	and who are not widows.
core_sampf : Equal to core_samp with the additional restriction that the woman is
	married with a spouse present.
citizen : Categorical variable for born abroad to American parents (1), naturalized
	citizen (2), or not a citizen (3).
imm_gen : Categorical variable for first generation immigrant (1), second generation
	immigrant (2), or 3rd+ generation native (3).
gt0chld : Indicator for having at least one child.
secgen : Indicator for second generation immigrant.
own_imm_mom : Indicator for own mother being an immigrant.
own_imm_dad : Indicator for own father being an immigrant.
incountry : Indicator for availability of source country variables.
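
To make the core_samp and core_sampf definitions concrete, here is a minimal
Python sketch. The input field names are hypothetical. The definitions above do
not say how women without children are treated; in this sketch they pass the
child conditions vacuously, which is an assumption rather than a documented
rule.

```python
# Hypothetical sketch of the core sample indicators.
def core_samp(p):
    """1 if the record meets the core sample conditions, else 0."""
    kids = p["children"]
    in_sample = (
        p["female"] == 1
        and 18 <= p["age"] <= 40
        and all(k["age"] <= 12 for k in kids)   # oldest child is 12 or younger
        and not any(k["foster"] for k in kids)  # no foster children
        and all(k["us_born"] for k in kids)     # all children born in the U.S.
        and not p["widowed"]
    )
    return int(in_sample)


def core_sampf(p):
    """core_samp plus the married, spouse present restriction."""
    return int(core_samp(p) == 1 and bool(p["married_spouse_present"]))


woman = {
    "female": 1, "age": 33, "widowed": False, "married_spouse_present": False,
    "children": [
        {"age": 9, "foster": False, "us_born": True},
        {"age": 5, "foster": False, "us_born": True},
    ],
}
```

For the made-up record above, core_samp is 1 and core_sampf is 0 because no
spouse is present.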


*** Other variables

hhwtnorm : Household weight variable normalized so that each sample year receives
	equal weight.
chld1 : Indicator for the first-born child being a girl.
femhdalt : Indicator for unmarried female with at least one child.
lths (sp_lths) : Indicator for (spouse) education less than high school.
scol (sp_scol) : Indicator for (spouse) education of some college.
cold (sp_cold) : Indicator for (spouse) education of a college degree or more.
genrace (sp_genrace) : Categorical variable for (spouse) race, including White
	(non-Hispanic) (1), Black (non-Hispanic) (2), Hispanic (3), Asian (non-Hispanic)
	(4), and Other (non-Hispanic) (5).
region : Categorical variable for Census region and division (9 categories).
year : Survey year.
age (sp_age) : (Spouse) age. age2 and age3 refer to age squared and age cubed.
nchild : Number of children.
sp_imm_mom : Indicator for spouse's mother being an immigrant.
sp_imm_dad : Indicator for spouse's father being an immigrant.
m_bpld : Mother's birth country.
f_bpld : Father's birth country.
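
One way a weight like hhwtnorm could be formed is to divide each household
weight by the total weight in its survey year, so that every year's weights sum
to the same value. The Python sketch below illustrates this on made-up data.
The raw weight name hhwt and the choice to scale each year to a total of one
are assumptions; the exact scaling used in the replication files is not stated
here.

```python
from collections import defaultdict

# Hypothetical sketch: rescale weights so each survey year sums to one.
rows = [
    {"year": 1995, "hhwt": 100.0},
    {"year": 1995, "hhwt": 300.0},
    {"year": 1996, "hhwt": 50.0},
    {"year": 1996, "hhwt": 50.0},
]


def normalize_by_year(records, weight="hhwt"):
    """Add hhwtnorm = weight divided by the total weight in the same year."""
    totals = defaultdict(float)
    for r in records:
        totals[r["year"]] += r[weight]
    return [dict(r, hhwtnorm=r[weight] / totals[r["year"]]) for r in records]


normalized = normalize_by_year(rows)
```

With this scaling, a year with many observations does not dominate a year with
few, since both contribute the same total weight.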


*** Parent Source Country Variables (See Data Appendix for Sources)

m_igdp00_07 : Mother source country GDP average (2000-2007)
m_isexratio00_07 : Mother source country sex ratio at birth average (2000-2007)
m_ilfp00_07 : Mother source country labor force participation average (2000-2007)
m_ifert00_07 : Mother source country fertility average (2000-2007)
m_iscore00_07 : Mother source country Gender Gap Index (2000-2007)

f_igdp00_07 : Father source country GDP average (2000-2007)
f_isexratio00_07 : Father source country sex ratio at birth average (2000-2007)
f_ilfp00_07 : Father source country labor force participation average (2000-2007)
f_ifert00_07 : Father source country fertility average (2000-2007)
f_iscore00_07 : Father source country Gender Gap Index (2000-2007)
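
As an illustration of how a 2000-2007 source country average could be formed
and attached to each parent, the Python sketch below averages a made-up annual
fertility series and looks it up by m_bpld and f_bpld. The country names and
values are invented, and skipping years with no data is an assumption of this
sketch; see the Data Appendix for the actual sources and construction.

```python
from statistics import mean

# Hypothetical annual fertility series by country (all values made up).
fertility = {
    "Country A": {2000: 2.0, 2004: 3.0, 2007: 4.0, 2010: 9.0},
    "Country B": {1999: 5.0},  # no data in 2000-2007
}


def avg_00_07(series):
    """Average of the available 2000-2007 values, or None if there are none."""
    values = [v for year, v in series.items() if 2000 <= year <= 2007]
    return mean(values) if values else None


country_avg = {country: avg_00_07(s) for country, s in fertility.items()}


def attach_parent_averages(person):
    """Add m_ifert00_07 and f_ifert00_07 based on each parent's birth country."""
    out = dict(person)
    out["m_ifert00_07"] = country_avg.get(person["m_bpld"])
    out["f_ifert00_07"] = country_avg.get(person["f_bpld"])
    return out


person = attach_parent_averages({"m_bpld": "Country A", "f_bpld": "Country B"})
```

In this example the mother's value is the mean of the three in-range years,
while the father's value stays missing because his birth country has no data
for 2000-2007.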







